Abstract. Non-linear image reconstruction and signal analysis deal with complex inverse problems.
To tackle such problems in a systematic way, I present information field theory (IFT) as a
means of Bayesian, data based inference on spatially distributed signal fields. IFT is a statistical 
field theory, which permits the construction of optimal signal recovery algorithms even for non- 
linear and non-Gaussian signal inference problems. IFT algorithms exploit spatial correlations of 
the signal fields and benefit from techniques developed to investigate quantum and statistical field 
theories, such as Feynman diagrams, re-normalisation calculations, and thermodynamic potentials. 
The theory can be used in many areas, and applications in cosmology and numerics are presented. 
INFORMATION FIELD THEORY
Field inference 

A physical field is a function over some continuous space. The air temperature over 
Europe, the magnetic field within the Milky Way, or the dark matter density in the 
Universe are all fields we might want to know as accurately as possible. Fortunately, 
we have measurement devices delivering us data on these fields. But the data is always 
finite in size, whereas any field has an infinite number of degrees of freedom, the field 
values at all locations of the continuous space the field is living in. Since it is impossible 
to determine an infinite number of unknowns from a finite number of constraints, an
exact field reconstruction from the data alone is impossible. Additional information is
needed. 

Additional information might be available in the form of physical laws, statistical
symmetries, or smoothness properties known to be obeyed by the field. A unique field
reconstruction might still be impossible, but the configuration space of possible field
realizations might be sufficiently constrained to single out a good guess for the field.

The combination of data and additional information is preferentially done in an
information theoretically correct way by using probabilistic logic. Information field
theory (IFT) is therefore information theory applied to fields, Bayesian reasoning with
an infinite number of unknowns [2, 3]. For a physicist, it is just a statistical field theory,
as we will see, and can borrow many concepts and techniques developed for such.
Mathematically, it deals with stochastic functions and processes and benefits from the
theory of Gaussian, Markov, Lévy, and other random processes.

(Information is understood here in its original and colloquial meaning to give form to the mind, or
"Information is whatever forces a change of rational beliefs" [1]. Mathematically, information theory
is just probability theory. In some contexts, but not here, negative entropy is called information as well,
although it is rather a measure of the amount of information than information itself.)

The main difference of IFT to the usual Bayesian inference is that the continuity 
of the physical space plays a special role. The fact that many physical fields do not 
exhibit arbitrary roughness due to their causal origins implies that field values at nearby
locations are similar, and typically more so the closer the locations are. The consequent 
exploitation of any knowledge on the field correlation structure permits us to overcome 
the ill-posedness of the field reconstruction problem. 



Path integrals 

Probabilistic reasoning requires that probability density functions (PDFs) can properly
be defined over the space of all possibilities [4]. The configuration space of a field
is of infinite dimensionality, since every location in space carries a field degree of free- 
dom. A little bit of thought is therefore needed on how to deal with PDFs over functional 
spaces before we can use probabilistic logic for field inference. 

Let $s = (s_x)_x$ be our unknown signal field living on some physical space $\Omega = \{x\}_x$, e.g.
$s$ might be a real- or complex-valued function $s : \Omega \to \mathbb{R}$ or $\mathbb{C}$. The configuration space
of $s$ could be constructed if the set of physical locations in space were finite, say of
size $N_\mathrm{pix}$ with $\Omega = \{x_1, \ldots, x_{N_\mathrm{pix}}\}$. Then the field values at these locations would form a
finite-dimensional vector $s = (s_{x_1}, \ldots, s_{x_{N_\mathrm{pix}}}) = (s_i)_{i=1}^{N_\mathrm{pix}}$ and the configuration space would
be just the space of such vectors. We could then define any PDF on this vector space, like
a signal prior $\mathcal{P}(s)$. This would also permit us to calculate configuration space integrals,
like the signal prior expectation value of any function $f(s)$ of the discretized signal,

\[ \langle f(s) \rangle_{(s)} = \int \mathcal{D}s \, f(s) \, \mathcal{P}(s) = \left( \prod_{i=1}^{N_\mathrm{pix}} \int ds_i \right) f(s) \, \mathcal{P}(s). \tag{1} \]

Now, we just have to require that the continuous limit of this discretization is possible,
yielding a path integral. This requires on the one hand that our space discretization gets
finer everywhere with $N_\mathrm{pix} \to \infty$ and on the other hand that all the involved quantities
($s$, $f(s)$, $\mathcal{P}(s)$) behave well under this limit. The latter just implies that any reasonable
expectation value $\langle f(s) \rangle_{(s)}$ should not depend on the discretization resolution if the
resolution is chosen sufficiently high. Thus, the definitions of the quantities $s$, $f(s)$, and
$\mathcal{P}(s)$ cannot depend on any grid specific properties and must be possible in the continuum
limit. We turn the last requirement into a design property:



An information field theory is defined over continuous space. 



Space discretization can be done in a second step, if needed in order to do inference on 
a computer. However, the theory shall not contain any discretization specific element.
This distinguishes IFT from many other proposed methods for field inference, Bayesian 
or not, since these often have definitions tightly linked to specific space discretizations, 
e.g. by using concepts like pixel statistics and nearest pixel field differences. The infer- 
ence results of such methods might depend on the chosen space discretization and might 
not be resolution independent. For IFT, we require that given a sufficiently high spatial 
resolution, the solution shall not change significantly with further resolution increase or 
with a rotation of the computational grid. 
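
The resolution requirement can be illustrated with a small numerical sketch. It rests on a toy setup that is not taken from the text: a zero mean Gaussian field on the unit interval with exponential covariance S(x, y) = exp(-|x - y| / ell), and the observable f(s) = integral of s(x) over that interval. The prior variance of f is evaluated on grids of different size, with the pixel volume included in the discretized integral, and compared with the continuum value. All function names and parameter values are hypothetical.

```python
import numpy as np

def variance_of_integrated_field(n_pix, ell=0.2):
    """Prior variance of f(s) = integral of s over [0, 1] on an n_pix grid.

    Toy assumption: prior covariance S(x, y) = exp(-|x - y| / ell).
    """
    dx = 1.0 / n_pix                       # pixel volume
    x = (np.arange(n_pix) + 0.5) * dx      # pixel centres
    S = np.exp(-np.abs(x[:, None] - x[None, :]) / ell)
    # <f^2> = sum_ij dx * S_ij * dx; the volume factors make the result
    # converge instead of growing with the number of pixels.
    return dx * S.sum() * dx

def continuum_variance(ell=0.2):
    """Analytic double integral of exp(-|x - y| / ell) over the unit square."""
    return 2.0 * ell * (1.0 - ell * (1.0 - np.exp(-1.0 / ell)))

if __name__ == "__main__":
    for n_pix in (16, 64, 256, 1024):
        print(n_pix, variance_of_integrated_field(n_pix), continuum_variance())
```

Without the factors of dx the sum would scale with the square of the pixel number, which is the kind of grid dependence the design property is meant to exclude.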

Dealing with an infinite number of degrees of freedom, we should not be surprised
about mathematical objects in IFT that are infinite (e.g. configuration space volumes,
entropies) or zero (e.g. properly normalized field PDFs) in the continuous limit. As long
as the quantity we are interested in is well defined in the continuous limit (e.g. the posterior
mean field), we should not worry too much, since divergences of auxiliary quantities
are well known in field theory and usually harmless. Frequently, only the well behaved
differences or ratios of such unbounded objects are of actual interest (relative entropies,
energy differences).

It is most instructive to see how IFT works in a concrete example. We therefore turn 
now to the simplest possible case. 



Suppose we are interested in a zero mean random field $s$, our signal, over continuous
$M$-dimensional Euclidean space $\Omega = \mathbb{R}^M$. The a priori field knowledge might be that the
field is following homogeneous and isotropic Gaussian statistics,

\[ \mathcal{P}(s) = \mathcal{G}(s, S) \equiv \frac{1}{|2\pi S|^{1/2}} \exp\left( -\frac{1}{2} s^\dagger S^{-1} s \right), \tag{2} \]

with the signal covariance $S = \langle s \, s^\dagger \rangle_{(s)}$ known from some physical considerations.
E.g., the field might be the cosmic density field for which, given a cosmological model,
the power spectrum can be calculated theoretically. The field $s$ is here regarded as a
vector from a function vector space (the configuration space of $s$) with the scalar product

\[ a^\dagger b = \int dx \, \bar{a}_x b_x \tag{3} \]

for any two fields $a$ and $b$. The determinant $|S|$ is of course poorly defined in the continuum
limit, but it is a perfectly sensible quantity in any finite space discretization. Since we only
use $|S|$ to ensure proper normalization of $\mathcal{P}(s)$, whereas our interest is in inferring $s$, there
is nothing to worry about.

(A code to handle this discretization properly is NIFTY, Numerical Information Field Theory.)

Our measured data set $d = (d_i)_i = (d_1, d_2, \ldots)$ enters the game via a data model. In
the simplest case of a linear measurement, the data is

\[ d = R s + n \tag{4} \]

with $(R s)_i = \int dx \, R_{ix} s_x$ being the signal response and $n = (n_i)_i = (n_1, n_2, \ldots)$ being the
noise. The response operator $R$ encodes the point spread function of our instrument,
the scanning strategy of the telescope used, and any (linear) operation done on the data,
like a Fourier transformation in case we measure with an interferometer. The noise shall
here also obey Gaussian zero mean statistics with known covariance $N = \langle n \, n^\dagger \rangle_{(n)}$ (now
with the data space scalar product $n^\dagger d = \sum_i \bar{n}_i d_i$), so that the data likelihood given the
signal is

\[ \mathcal{P}(d|s) = \mathcal{G}(d - R s, N). \tag{5} \]
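Now the signal field posterior can be constructed via Bayes' theorem,

\[ \mathcal{P}(s|d) = \frac{\mathcal{P}(d|s) \, \mathcal{P}(s)}{\mathcal{P}(d)} = \frac{e^{-H(d,s)}}{Z_d}, \tag{6} \]

where we just defined the information Hamiltonian and its partition function,

\[ H(d,s) = -\ln \mathcal{P}(d,s) = -\ln \mathcal{P}(d|s) - \ln \mathcal{P}(s) \quad \text{and} \tag{7} \]
\[ Z_d = \int \mathcal{D}s \, e^{-H(d,s)} = \int \mathcal{D}s \, \mathcal{P}(d,s) = \mathcal{P}(d), \tag{8} \]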

in order to translate Bayesian language into that of statistical field theory. Thus, we can 
use any technique developed for such in order to do our signal inference. 



Wiener filter 

For our specific linear and Gaussian measurement problem, the Hamiltonian

\[ H(d,s) \,\widehat{=}\, \frac{1}{2} (d - R s)^\dagger N^{-1} (d - R s) + \frac{1}{2} s^\dagger S^{-1} s \tag{9} \]

is quadratic in $s$. We have dropped here irrelevant $s$-independent terms, as indicated by
"$\widehat{=}$". This Hamiltonian can be brought into the canonical form

\[ H(d,s) \,\widehat{=}\, \frac{1}{2} (s - m)^\dagger D^{-1} (s - m) \tag{10} \]

via quadratic completion, where $m = D j$, $D = (S^{-1} + R^\dagger N^{-1} R)^{-1}$, and $j = R^\dagger N^{-1} d$.
This implies that the signal posterior is Gaussian with mean $m = \langle s \rangle_{(s|d)}$ and covariance
$D = \langle (s - m)(s - m)^\dagger \rangle_{(s|d)}$,

\[ \mathcal{P}(s|d) = \mathcal{G}(s - m, D), \tag{11} \]

a result well known in Wiener filter theory of signal reconstruction [5].
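
The quadratic completion leading from the Gaussian prior and likelihood to the canonical form can be written out step by step. The following is a sketch that assumes $N$ and $S$ (and hence $D$) are Hermitian, so that $j^\dagger = d^\dagger N^{-1} R$:

```latex
\begin{align*}
H(d,s) &\,\widehat{=}\, \tfrac{1}{2} (d - R s)^\dagger N^{-1} (d - R s) + \tfrac{1}{2} s^\dagger S^{-1} s \\
 &= \tfrac{1}{2} s^\dagger \left( S^{-1} + R^\dagger N^{-1} R \right) s
    - \tfrac{1}{2} \left( s^\dagger R^\dagger N^{-1} d + d^\dagger N^{-1} R s \right)
    + \tfrac{1}{2} d^\dagger N^{-1} d \\
 &= \tfrac{1}{2} s^\dagger D^{-1} s - \tfrac{1}{2} \left( s^\dagger j + j^\dagger s \right)
    + \tfrac{1}{2} d^\dagger N^{-1} d \\
 &= \tfrac{1}{2} (s - m)^\dagger D^{-1} (s - m) - \tfrac{1}{2} j^\dagger D j
    + \tfrac{1}{2} d^\dagger N^{-1} d , \qquad m = D j .
\end{align*}
```

The last two terms do not depend on $s$ and are therefore among the terms dropped under "$\widehat{=}$".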

In a field theoretical language, the data dependent $j$ is an information source field,
which excites our knowledge of $s$ away from zero, the value preferred by the prior. The
Wiener variance $D$ plays two distinct roles. On the one hand it is the susceptibility of
our mean field $m$ to the force of the information source $j$, since $m = D j$; on the other
hand it describes the remaining a posteriori uncertainty $D = \langle (s - m)(s - m)^\dagger \rangle_{(s|d)}$.
$D$ is also called the information propagator, since $D_{xy}$ transports the
information source at location $y$ to the location $x$ of interest in $m_x = (D j)_x = \int dy \, D_{xy} j_y$.

In practice, one will use an iterative linear algebra method like the conjugate gradient
method to solve numerically the equation $D^{-1} m = j$ for $m$ on a computer [6].
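
A minimal sketch of this procedure on a one-dimensional pixel grid is given below. Everything specific in it is an assumption made for illustration and does not come from the text: the exponential prior covariance, the response that observes every second pixel, the white noise level, and all function names. The conjugate gradient routine is a textbook implementation and is not the interface of NIFTY or any other package.

```python
import numpy as np

def conjugate_gradient(apply_A, b, rel_tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite operator A."""
    x = np.zeros_like(b)
    r = b - apply_A(x)          # residual
    p = r.copy()                # search direction
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs) <= rel_tol * b_norm:
            break
        Ap = apply_A(p)
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def wiener_filter_demo(n_pix=64, ell=8.0, sigma=0.1, seed=0):
    """Simulate d = R s + n and reconstruct m by solving D^-1 m = j."""
    rng = np.random.default_rng(seed)
    x = np.arange(n_pix, dtype=float)
    # Toy prior covariance (assumption): exponential kernel.
    S = np.exp(-np.abs(x[:, None] - x[None, :]) / ell)
    S_inv = np.linalg.inv(S)
    # Toy response (assumption): only every second pixel is observed.
    R = np.eye(n_pix)[::2]
    N_inv = np.eye(R.shape[0]) / sigma**2
    # Draw a signal from the prior and simulate the data.
    s = np.linalg.cholesky(S) @ rng.standard_normal(n_pix)
    d = R @ s + sigma * rng.standard_normal(R.shape[0])
    # Information source j = R^T N^-1 d.
    j = R.T @ (N_inv @ d)

    def apply_D_inv(v):
        # D^-1 v = S^-1 v + R^T N^-1 R v, applied without forming D.
        return S_inv @ v + R.T @ (N_inv @ (R @ v))

    m = conjugate_gradient(apply_D_inv, j)
    return s, d, m, S, R, sigma
```

Only the action of $D^{-1}$ on a vector is needed, which is what makes the approach usable when $D$ itself is too large to store. In this small example the result can be checked against the equivalent data space form $m = S R^\dagger (R S R^\dagger + N)^{-1} d$.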



Interacting theory 

Interaction Hamiltonian 

If any of the assumptions of our Wiener filter theory scenario is violated, in that the 
signal response is non-linear, the field or the noise is non-Gaussian, the noise variance 
depends on the signal, or the noise or signal covariances are unknown and have to 
be determined from the data itself, the resulting information Hamiltonian will contain 
anharmonic terms. These terms couple the different eigenmodes of the information 
propagator and lead to an interacting field theory. In many cases the Hamiltonian can 
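be Taylor-Fréchet expanded as

\[ H(d,s) = \underbrace{\frac{1}{2} s^\dagger D^{-1} s - j^\dagger s + H_0}_{H_\mathrm{free}} + \underbrace{\sum_{n=3}^{\infty} \frac{1}{n!} \int dx_1 \cdots dx_n \, \Lambda^{(n)}_{x_1 \ldots x_n} s_{x_1} \cdots s_{x_n}}_{H_\mathrm{int}} \tag{12} \]

and thereby split into a free ($H_\mathrm{free}$) and an interaction ($H_\mathrm{int}$) part. Let us assume that the
interaction terms are small. This can often be achieved, e.g., by shifting the field values
to $s' = s - s_\mathrm{cl}$, where $s_\mathrm{cl}$ is the minimum of the Hamiltonian, the classical field, or in
inference language, the maximum a posteriori estimator. Expanding $H(d,s') = H(d, s = s_\mathrm{cl} + s')$
around $s' = 0$ then often ensures small interaction terms around the origin.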

In this case, it is possible to expand the mean field value, or any other quantity of 
interest, around its free theory value. Since the terms of such an expansion can become 
numerous and complex, this is best done diagrammatically. 



Feynman diagrams 

Feynman diagrams provide a diagrammatic expansion to calculate field expectation values
perturbatively. We do not explain here how they work in detail; for IFT this is
detailed in [3]. We rather stress the important point that the main elements
of the diagrams, the lines connecting source points and interaction vertices, are just
an application of the propagator $D$. Since this could be done numerically for the free
theory/Wiener filter case, we are already equipped with the necessary computational
tools to calculate more complex diagrams. For example, the mean field of an interacting
theory might be

\[ m = \langle s \rangle_{(s|d)} = D j - \frac{1}{2} D \Lambda^{(3)}[\cdot, D j, D j] - \frac{1}{2} D \Lambda^{(3)}[\cdot, D] + \ldots, \tag{13} \]

where each term stands for one Feynman diagram, in this order, and where we introduced
$\Lambda^{(n)}[a, b, \ldots] = \int \cdots \int dx_1 \cdots dx_n \, \Lambda^{(n)}_{x_1 \ldots x_n} a_{x_1} b_{x_2} \cdots$ as a compact
tensor notation (a dot marks an index that is left open). The first diagram gives the Wiener
filter signal reconstruction. In the second diagram, two Wiener filter maps are combined by the
$\Lambda^{(3)}$-interaction, and then propagated to form the first non-linear correction to the
Wiener filter. In the third diagram, the Wiener covariance replaces the two Wiener maps of the
previous diagram, providing a correction due to the non-linearity effects on the uncertainty structure.
More complex diagrams might also provide significant corrections, and have then to 
be calculated too. However, their computation can always be based on the linear Wiener 
filter case of the free theory, and is therefore possible. 
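
How such diagrams reduce to applications of the propagator can be sketched for the three terms of Eq. (13). The example assumes a purely local cubic vertex on a grid with unit pixel volume, a choice made here for illustration only; the function name and the coupling `lam` are hypothetical. For small systems $D$ is kept as an explicit matrix, whereas in a realistic setting each product with $D$ would be a Wiener filter solve.

```python
import numpy as np

def mean_field_first_order(D, j, lam):
    """First three diagrams of Eq. (13) for a local cubic vertex.

    Toy assumption: Lambda^(3)_{xyz} = lam * delta_xy * delta_yz on a grid
    with unit pixel volume, so that
        Lambda[., a, b]_x = lam * a_x * b_x   and
        Lambda[., D]_x    = lam * D_xx.
    """
    m_wiener = D @ j                                   # free theory map
    tree = -0.5 * D @ (lam * m_wiener * m_wiener)      # two maps meet at a vertex
    loop = -0.5 * D @ (lam * np.diag(D))               # covariance closes a loop
    return m_wiener + tree + loop, m_wiener, tree, loop

if __name__ == "__main__":
    D = np.array([[0.5, 0.1], [0.1, 0.4]])
    j = np.array([1.0, -2.0])
    m, m_wiener, tree, loop = mean_field_first_order(D, j, lam=0.1)
    print("Wiener filter:", m_wiener)
    print("tree correction:", tree)
    print("loop correction:", loop)
    print("first order mean field:", m)
```

For `lam = 0` the result reduces to the Wiener filter map, and the tree term alone solves the stationarity condition of the cubic Hamiltonian up to terms of second order in `lam`.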



Thermodynamical inference

A diagrammatic perturbation calculation leads to well performing algorithms in case 
the interaction terms are small. If they are large, resummation and renormalization 
techniques can be used and have proven to lead to well performing algorithms even 
for very non-linear measurement situations [3] or in cases where the signal covariance 
has to be inferred as well from the data used for the signal reconstruction [7].

These techniques can be complex, and the meaning of the results is not necessarily 
intuitively understood. For the treatment of highly interacting quantum field theories, 
the effective action approach has proven helpful. The effective action is the Gibbs free 
energy $G$ known from thermodynamics (here with temperature $T = 1$), and this energy
has the property that the map $m$, which minimizes it, is the desired mean field $m = \langle s \rangle_{(s|d)}$
given all constraints by the data.

The Gibbs free energy is the Legendre transformed Helmholtz free energy, which
itself is (basically) the logarithm of the partition function $Z_d$. If we could calculate the
partition function, we would be able to calculate the mean field reconstruction directly from
it via differentiation with respect to the information source coefficient:

\[ \langle s \rangle_{(s|d)} = \frac{\partial \ln Z_d}{\partial j}. \tag{14} \]

Thus, at first sight, we gain nothing by reformulating the inference problem in
terms of a Gibbs free energy, since this can only be calculated exactly in case we already
have solved the problem.

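However, the Gibbs free energy can also be expressed in terms of the internal
energy $U = \langle H(d,s) \rangle_{(s|d)} = \int \mathcal{D}s \, \mathcal{P}(s|d) \, H(d,s)$ and the Boltzmann entropy
$S_B = -\int \mathcal{D}s \, \mathcal{P}(s|d) \ln \mathcal{P}(s|d)$ as

\[ G = U - T S_B. \tag{15} \]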



This allows for a convenient approximation scheme, by replacing $\mathcal{P}(s|d)$ in the above
definitions with an approximate Gaussian surrogate $\mathcal{G}(s - m, D)$ (except for the
Hamiltonian in $U$), with mean $m$ and dispersion $D$ still to be determined. This replacement
turns the definitions for $U$ and $S_B$ into Gaussian integrals, which can often be calculated
analytically, e.g. $S_B = \frac{1}{2} \mathrm{tr}(1 + \ln(2\pi D))$.

Minimizing the resulting Gibbs free energy with respect to the unknown $m$ and $D$
then gives equations determining these quantities approximately. This method of
thermodynamical inference has proven to reproduce previously found results from
renormalization and resummation calculations with much less effort [8]. It was also very
useful in developing novel algorithms, e.g. to deal with the problem of reconstructing a
Gaussian signal field where the signal covariance is unknown but spectral smoothness
can be assumed [9] or where both the signal and the noise covariance were not known
[10]. The resulting algorithm, named extended critical filter, was successfully used for a
reconstruction of the Galactic Faraday rotation sky signal [11].
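
The scheme can be tried out on a toy problem with a single degree of freedom, where the exact posterior mean is available by brute-force integration. The quartic Hamiltonian, the parameter values, and the function names below are assumptions chosen for illustration and do not come from the text. With a Gaussian surrogate of mean m and variance D, the moments are <s^2> = m^2 + D and <s^4> = m^4 + 6 m^2 D + 3 D^2, which gives U and S_B in closed form.

```python
import numpy as np

def gibbs_energy(m, D, j, lam):
    """G = U - S_B (T = 1) for a Gaussian surrogate with mean m, variance D.

    Toy assumption: one degree of freedom with
    H(s) = s**2 / 2 - j * s + lam * s**4 / 24.
    """
    U = 0.5 * (m**2 + D) - j * m + lam * (m**4 + 6 * m**2 * D + 3 * D**2) / 24
    S_B = 0.5 * (1.0 + np.log(2.0 * np.pi * D))
    return U - S_B

def minimize_gibbs(j, lam, n_iter=200):
    """Fixed-point iteration on dG/dm = 0 and dG/dD = 0."""
    m, D = j, 1.0  # start from the free (lam = 0) solution
    for _ in range(n_iter):
        m = j / (1.0 + lam * (m**2 + 3.0 * D) / 6.0)   # from dG/dm = 0
        D = 1.0 / (1.0 + lam * (m**2 + D) / 2.0)       # from dG/dD = 0
    return m, D

def exact_mean(j, lam, s_max=12.0, n=200001):
    """Posterior mean by brute-force numerical integration on a grid."""
    s = np.linspace(-s_max, s_max, n)
    H = 0.5 * s**2 - j * s + lam * s**4 / 24
    w = np.exp(-(H - H.min()))
    return np.sum(s * w) / np.sum(w)

if __name__ == "__main__":
    j, lam = 1.0, 0.5
    m, D = minimize_gibbs(j, lam)
    print("Gibbs mean and variance:", m, D)
    print("exact posterior mean:", exact_mean(j, lam))
```

The plain fixed-point iteration is adequate for the weak coupling used here; for stronger non-linearities a damped update or a generic minimizer applied to `gibbs_energy` would be the safer choice.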

It is interesting to note that this minimal Gibbs free energy is equivalent to a minimal
Kullback-Leibler distance of $\mathcal{G}(s - m, D)$ to $\mathcal{P}(s|d)$ or to Maximum Entropy for
$\mathcal{G}(s - m, D)$ with $\mathcal{P}(s|d)$ as the prior distribution [8]. Thus information theory has basically
reformulated methods developed earlier in thermodynamics, e.g. see [1].
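
The equivalence with the Kullback-Leibler distance follows in one line from the posterior in the form $\mathcal{P}(s|d) = e^{-H(d,s)}/Z_d$. As a sketch, writing $\tilde{G}(m, D)$ for the Gibbs free energy evaluated with the Gaussian surrogate at $T = 1$ (a notation introduced here only for this derivation):

```latex
\begin{align*}
\mathrm{KL}\left[ \mathcal{G} \,\|\, \mathcal{P} \right]
 &= \int \mathcal{D}s \; \mathcal{G}(s - m, D) \, \ln \frac{\mathcal{G}(s - m, D)}{\mathcal{P}(s|d)} \\
 &= \underbrace{\int \mathcal{D}s \; \mathcal{G}(s - m, D) \, \ln \mathcal{G}(s - m, D)}_{-S_B}
  + \underbrace{\int \mathcal{D}s \; \mathcal{G}(s - m, D) \, H(d,s)}_{U}
  + \ln Z_d \\
 &= \tilde{G}(m, D) + \ln Z_d .
\end{align*}
```

Since $\ln Z_d$ depends on neither $m$ nor $D$, minimizing the distance and minimizing the approximate Gibbs free energy select the same Gaussian.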

APPLICATIONS 

As the general theory of signal field inference, IFT has vast applications of which I want 
to mention a few listed at www.mpa-garching.mpg.de/ift.

Cosmic magnetism studies have already been mentioned. IFT was here used to con- 
struct Galactic Faraday rotation maps from noisy data with unreliable noise information 
[11]. The resulting maps can be analysed in order to test for helicity in Galactic magnetic
fields [11, 12].

Cosmography is the 3-d cartography of the Cosmos. The main landmarks are the
abundant galaxies tracing the filamentary and knotty distribution of dark matter in
space. Initial studies used Wiener filtering [13, 14], later ones the log-normal-Poisson model
[3, 15, 16, 17, 18], whereas the latest use the evolution of the Gaussian initial conditions
into the observed density field [19, 20].

Cosmic Microwave Background (CMB) studies are particularly well suited for IFT,
since the CMB temperature statistics are very close to Gaussian. The weak non-Gaussianity is
scientifically extremely interesting, since it is one of the few characteristic signatures
of the inflationary epoch. An IFT data filter to search for such non-Gaussianity
reproduces already known non-Gaussianity detection methods, while transferring them into
a Bayesian setting [3]. Cross correlation studies of CMB and cosmic structure are also
conveniently formulated in an IFT language [21, 22, 23].

Stochastic estimation methods are widespread in numerics. For example, the diagonals
and traces of complex numerical operators on high-dimensional function spaces
(e.g. the propagator $D$ of IFT) are often calculated approximately via stochastic
probing. However, the real space structure of many such operator diagonals often
exhibits sufficient smoothness that IFT methods can speed up their calculation [24]. 

Numerical simulations of partial differential equations face the problem that their
differential operators require continuous fields to act on, but the data in computer
memory is discrete. Thus a specific sub-grid field structure is usually assumed by
conventional simulation schemes. IFT permits the construction of the ensemble of plausible
continuous fields that are consistent with the data and other knowledge, on which the
operators can act in order to produce the time evolved field ensemble. Recasting this
into an ensemble described by computer data using entropic matching leads to a new
and potentially better simulation methodology, called information field dynamics [25].